ROX-12921: initialize counter metrics with 0 value #720
Skipping CI for Draft Pull Request.
e530bb3 to a7c3046
/retest
pkg/metrics/metrics.go
Outdated
// We do not initialize observatorium request count metric as it is unused.
// We do not initialize database request count metric as it grows quickly and would not suffer from being undefined.
// We initialize reconciler metrics in InitReconcilerMetricsForType.
func InitOperationMetricsWithZero() {
I think we can leave out the operations metrics completely for the same reason as the database request count. Then we don't need the comment, which seems like it could go stale very fast - e.g. once we remove the Observatorium code, or decide to use it after all. So I would propose to just do the minimum here, and only init the reconciler metrics.
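As a rough illustration of what "only init the reconciler metrics" could look like, here is a minimal sketch in plain Go. It is not the PR's code: `counterVec`, `initWithZero`, `exported`, and the label values `type-a` and `type-b` are hypothetical stand-ins, and the body of `InitReconcilerMetricsForType` is not shown in this conversation. The sketch only assumes that a labelled counter series is exported once its label value has been touched, so adding 0 for each label value known up front makes the series visible with value 0.

```go
package main

import (
	"fmt"
	"sort"
)

// counterVec is a hypothetical stand-in for a labelled counter family.
// A series is only exported once its label value has been touched.
type counterVec struct {
	series map[string]float64
}

func newCounterVec() *counterVec {
	return &counterVec{series: map[string]float64{}}
}

// Add creates the series for the label value if needed, then adds delta.
func (c *counterVec) Add(label string, delta float64) {
	c.series[label] += delta
}

// initWithZero exports a 0 sample for every label value known up front.
func initWithZero(c *counterVec, labels []string) {
	for _, l := range labels {
		c.Add(l, 0)
	}
}

// exported lists the label values that currently have a series, sorted.
func exported(c *counterVec) []string {
	out := make([]string, 0, len(c.series))
	for l := range c.series {
		out = append(out, l)
	}
	sort.Strings(out)
	return out
}

func main() {
	reconcilerErrors := newCounterVec()
	// Hypothetical reconciler type names, assumed to be fixed at startup.
	initWithZero(reconcilerErrors, []string{"type-a", "type-b"})
	fmt.Println(exported(reconcilerErrors), reconcilerErrors.series["type-a"])
}
```

With the Prometheus Go client, calling `WithLabelValues(...).Add(0)` on a `CounterVec` has the same effect of creating the series with value 0. This only works when the set of label values is known in advance.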
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ivan-degtiarenko, stehessel
Description
The focus of this PR is to introduce a 0 value as a default value for "slow" counter metrics.
"Slow" here means that the counter metric is not actively updating and can take a long time before being incremented for the first time.
The reason is that, to use the Prometheus rate function correctly in an alert, a zero value must be defined. The rate function is used to determine when the counter metric goes up.
This PR does not solve the described issue for the central timeout metric, because the solution in this PR works only for metrics without dynamically changing labels. The issue for the timeout metric will be fixed on the alert side.
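The effect on rate can be sketched with a small simulation. This is a simplified model, not Prometheus itself: `sample` and `increase` are hypothetical names, and the calculation ignores extrapolation and counter resets. It relies only on the fact that rate needs at least two samples of a series inside the range window to produce a result.

```go
package main

import "fmt"

// sample is one scrape result; ok is false when the series did not exist yet.
type sample struct {
	value float64
	ok    bool
}

// increase returns last minus first over the samples that exist in the window.
// Like Prometheus rate, it yields no result with fewer than two samples.
// Simplification: no extrapolation and no counter-reset handling.
func increase(window []sample) (float64, bool) {
	var present []float64
	for _, s := range window {
		if s.ok {
			present = append(present, s.value)
		}
	}
	if len(present) < 2 {
		return 0, false
	}
	return present[len(present)-1] - present[0], true
}

func main() {
	// Counter first exported at its first increment: absent, absent, then 1.
	uninitialized := []sample{{0, false}, {0, false}, {1, true}}
	// Counter exported as 0 from startup, then incremented once.
	initialized := []sample{{0, true}, {0, true}, {1, true}}

	_, ok := increase(uninitialized)
	fmt.Println("uninitialized: has result =", ok)
	v, ok := increase(initialized)
	fmt.Println("initialized: has result =", ok, "increase =", v)
}
```

In the uninitialized case the first increment is never observed: once more samples arrive they all carry the value 1, so the computed increase stays at 0 until the next increment.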
Checklist (Definition of Done)
Test manual
ROX-12345: ...
Test manual
Open http://localhost:8080/metrics and check the worker counter metrics and the operations counter metrics.